Adaptively Aligned Image Captioning via Adaptive Attention Time

Neural Information Processing Systems

AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and doesn't introduce any noise to the parameter gradients.
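The sketch below makes "learning how many attention steps to take" more concrete. It uses a halting rule in the style of Adaptive Computation Time and should be read as an illustrative assumption, not as the authors' implementation. The function names `act_halting_weights` and `mix_states`, the `eps` threshold, the per-step confidences, and the attended vectors are all hypothetical. At one decoding step, attention steps continue until the accumulated confidence reaches `1 - eps` or the step budget runs out. The output is then a weighted mix of the states from the steps actually taken.

```python
from typing import List, Sequence, Tuple


def act_halting_weights(
    halt_probs: Sequence[float], eps: float = 0.01
) -> Tuple[int, List[float]]:
    """ACT-style halting rule (illustrative assumption, not the AAT authors' code).

    halt_probs[n] is a hypothetical per-attention-step confidence in [0, 1].
    Steps are taken until the accumulated confidence reaches 1 - eps or the
    step budget len(halt_probs) is exhausted. Returns (steps_taken, weights);
    the weights sum to 1 and the last step taken receives the remainder.
    """
    if not halt_probs:
        raise ValueError("need at least one attention step")
    weights: List[float] = []
    total = 0.0
    for n, p in enumerate(halt_probs, start=1):
        if total + p >= 1.0 - eps or n == len(halt_probs):
            # Halt here: the final step gets whatever probability mass is left.
            weights.append(1.0 - total)
            return n, weights
        weights.append(p)
        total += p
    raise AssertionError("unreachable: the last step always halts")


def mix_states(states: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of the attended states from the steps actually taken."""
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]


# Hypothetical confidences for up to three attention steps at one decoding step.
steps, weights = act_halting_weights([0.3, 0.5, 0.4])
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one made-up attended vector per step
output = mix_states(states[:steps], weights)
print(steps, [round(w, 3) for w in weights], [round(x, 3) for x in output])
```

The weights are a deterministic function of the confidences, and nothing is sampled, so no sampling noise enters the gradients of the mixture. This matches the property the abstract emphasizes, although the exact formulation used in AAT may differ from this sketch.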